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Abstract 

It is usual to rely on the quasi-likelihood methods for deriving statistical methods applied to 
clustered multinomial data with no underlying distribution. Even though extensive literature can 
be encountered for these kind of data sets, there are few investigations to deal with unequal cluster 
sizes. This paper aims to contribute to fill this gap by proposing new estimators for the intracluster 
correlation coefficient. 
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1 Introduction 

When categorical data arise from individuals classified into groups of individuals or cluster of objects, 
the major issue is that observations within a cluster are not independent and the conventional methods 
of inference for multinomial sampling, are inappropriate. The strength of similarity of two observations 
within a cluster is typically measured by the intracluster correlation coefficient, whereas observations 
from separate clusters are regarded as independent. In most situations, the intracluster correlation 
tends to be positive and this induces that the variances of the counts under clustered sampling to 
be greater than the ones under multinomial sampling, namely extra variation with respect to the 
multinomial sampling (for the technical details, see page [3]). This kind of observations are referred 
to as overdispersed multinomial clustered data. Some models in the literature have been considered 
for this type of “complex sampling”. See Altham (1976), Brier (1980), Cohen (1976), Hall (2000), 
Menendez et al. (1995, 1996), Morel and Nagaraj (1993), Neerchal and Morel (1998) and references 
therein. 

A sample of size n > 1 is taken in each of the N independent clusters, 

xW = (x[ £ \...,xW) T , 1 = 1, 
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with realizations in the sampling space X ={1, ...,M}, i.e. 


W=1 ,...,JV, p r (0) = Pt(xW = r) > 0, s = 1, n, r = l,...,M, 

with Y^lLiPr (9) = 1- This distribution, 

p(0) = (pi(0) ,~;PM (0)) T , (IT) 

is assumed to be unknown but belonging to a known family of discrete distributions on X, V = {p ( 6 ) : 
6 € 0}, with 0 C M m ° (Mo < M). In other words, the true value of parameter 6 = (6\, ...,9m 0 ) T > 9o, 
is assumed to be unknown. We denote the number of units in the £-th cluster that are classified into 
the r-th category by 

n 

Yr W =El{r}( xi s e) ), r = 1,M, i=\ ,...,JV, (1.2) 

S=1 

with /{ r |) being equal to 1 if = r and 0 otherwise. Therefore, Y^ 1 + ... + = n, i.e., all 

clusters contain the same number of units, n. In Section [4] a generalization for unequal cluster sizes is 
presented. The M-dimensional vector of cell counts associated with the i- th cluster, 

Y® = (Yl e >,...,Y$f, (1.3) 


is the so-called contingency table. 

In what is to follow, we shall assume that p ( 6) belongs to the general class of log-linear models 
with full column rank M x Mq design matrix W, 


p{6) 


exp{VF0} 
1-M e xp {W6} 1 


(1.4) 


where the A 1 linearly independent column vectors of W, are also linearly independent with respect 
to the M-dimensional vector of ones, 1 m = (1, l) 2 ^- The assumption established by (11.41) is the 

condition needed to define the parametric space of 6 , 0, for log-linear models. 

The assumption about p(9), belonging to the general class of log-linear models, covers important 
models. We are going to clarify this point for a two dimensional log-linear models, undestanding that 
it is easily generalized for any other dimension. If the K-tli cluster’s sample come from a bidimensional 
variable (W, X 2 ) with I and J categories respectively, we have 


(x[%x^)ex = {l ,...,/}x {!,..., J}, 


= 1 

s = 1 


and the single index probability vector (11.11) matches the double index probability vector, in lexico¬ 
graphic order, 


P (9) = (Pll (0) ,P12 (0) , -,PIJ (0)) T , 

Pij (0) = Pr(X 1 = i,X 2 = j), i = 1 ,j = l,...,J, 
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i.e. in this case, we have M = I x J cells. The sample of counts given in (11.21) for each cluster 
l = 1,..., N can be denoted using the double index notation, through 




(1.5) 


S=1 


In this setting, we have a two-way contingency table with I rows and J columns for each cluster, 

\r{t) _ /yW yW y(^)\T 

1 ~ V J 11 i 1 12 > 1 IJ ) > 

corresponding to the cells counts of two variables X\ and X 2 , respectively. The independence model 
between X\ and X 2 is the most important model for two-way contingency tables, defined primarily as 


Pij (0) = Pi. (6)p.j (6), i = j = l,...,J, 

where p t . (6) = J2j=iPij ( 0 ); P»j (#) = J2i=iPij (0)> an d expressed as 

log Pij(0) = u + 0 1 ( i) +6 2 (j), i = j = (1.6) 

in terms of log-linear models, jointly with the restrictions to avoid overparemeterization, 

= Yj® 2 (j) = 0- 
i=l j =1 

For the traditional multinomial log-linear models, the first and second order moments of are 

E[yW] = np(0) and Var[l" w ] = nE p(fl) , 

where 

s p(0) = D p(0) ~ V (^) V (0) T , (1-7) 

and D p ( 0 ) is the diagonal matrix of p(0). In this paper, we shall assume the components of sample 
vectors Y^ to be overdispersed with respect to the model with multinomial sampling, i.e., 

E[yW] = np(0) and Xax[Y^} = 4nS p(0) , (1.8) 

with 

i!) n = 1 + (n - 1) p 2 € (l,n] (1.9) 

referred to as “design effect” and p 2 € (0,1] to as “intracluster correlation coefficient”. Notice that 
dn = 1 would correspond to the multinomial sampling with parameters n and p ( 0 ), which means that 
either the components of X^ are mutually independent ( p 2 = 0) or there is a unique observation 
without possibility of being correlated (n = 1). 

In order to interpret p 2 , we could consider {Yr^\Z r = p r (6)) ~ Bin(n,p r (0)), with Z r being a 
generic latent random variable which models the probability of success for each of the individuals 
associated with with E [Z r \ = p r (6) and Yai[Z r \ = E [Z 2 \ — E 2 [Z r \ has a general shape. Since the 
support of Z r is [0,1], it holds that Z r > Z 2 and so E [Z r ] > E [Z 2 ] or 

E [Z r ] - E 2 [Z r \ > E [Z 2 \ - E 2 [Z r \ 

p r (6) (1 — p r (9)) > Var [Z r \. (1-10) 
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From (11.101) . there exists p 2 € [0,1] such that Var[Z r ] = p 2 p r ( 0 ) (1 — p r (0)) and thus 
E[yW] = E[E[yW| Z r }\ = np r (0), 

Var[Y r W] = E[Var[yW|Z r ]] + Var[E[E \Z r }} = d^np r (0) (1 - p r (0)), 

where = 1 + (n — 1) p 2 . Since the same degree of over dispersion is assumed over the M categories, 
it holds p\ = ■ ■ ■ = p 2 M = p 2 , '(In' 1 = ■ ■ ■ = = $ n , and now 

Var^' 1 ] = , d n np r (0) (1 — p r (0)), r = l,....,M, £ = 1,...,N, 


match the diagonal elements of the inflated variance-covariance matrix given in (11.81) . 

Ann and James (1995) presented an algorithm for generating overdispersed binomial distributions. 
Some examples of distributions for Y^\ with expectation vector and variance-covariance given in 
m, are the following: the Dirichlet-multinomial, the random-clumped multinomial and n-inflated 
multinomial distributions. The Dirichlet-multinomial distribution 


Pr (Y} 1) =y 1 ,...,Y 1 




M 



IXC) UrLl T (.yr + Cp r (0)) 

r(n + c) n^ir(cp r (0)) 


(i.ii) 


where y s € Z+, Y^ =1 y r = n, c = p 2 (1 - p 2 ), ( yi . n . yM ) = ri\/Y\^ =1 yr l - and T(-) denotes the gamma 
function, is due to Mosimann (1962). 

The random-clumped multinomial distribution 


M 


Pr(F^ =y) = Y J Pr (0) Pr (U^ = y ), 


( 1 . 12 ) 


r =1 


where y = (yi, ..., pm) T , Vr € Z + , Vr = n -> U^ r \ r = 1 , ..., M are multinomial random vectors 

C/ (r) ~ M(n, (1 — p)p (0) -I- pe r ), r = 1,..., M, 


p € [0,1] and e r is r-th the unit vector of dimension M (1 in the r-th position and the rest elements 
are zero), is due to Morel and Nagaraj (1993). The n-inflated multinomial distribution 

Pr(F« = y) = (1 - P 2 ) Pr (U = y)+ p 2 ^ =l I {n} (y r )Pr (0) , (1-13) 

where y = (yi, ...,y M ) T , Vr € Z+, Vr = n i 

U ~ M(n,p{G)), 


is due to Cohen (1976) and Altharn (1976). The multinomial distribution, with zero inflation in the 
first M — 1 cells, i.e. n-inflation in the M- th cell 

Pr (yW =y)= w Pr(C7 = y) + (1 - w)I {n} (y r )(n), (1.14) 

for any w € (0,1), cannot be considered in general as a distribution with expectation vector and 
variance-covariance given in (11.81) . but does satisfy both moments for the special case of M = 2, i.e. 
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for the zero-inflated binomial distribution. The details are given in Section IA.1I in the Appendix. 
The advantage of not using the distributional assumption in the model is that we can address the 
estimation problem in a similar way done for the multinomial sampling, if we correct the estimator 
of the variance-variance covariance matrix through an appropriate estimator of the design effect or 
intracluster correlation coefficient. The consistency of both estimators is an important property in 
order to make statistical inference about the goodness of fit or other kind of hypothesis testing. 

Throughout this paper we shall assume at the beginning, that all the contingency tables Y^\ 
1 = 1, have a common sample size, n. This assumption is often violated (e.g., due to missing 

values). The extension of the results from equal cluster sizes to the unequal cluster sizes is not difficult, 
nevertheless, as we are aware, even for the quasi-likelihood methodology, no paper has previously 
provided an explicit expression for a consistent estimator of the design effect (i? n ) or intracluster 
correlation coefficient ( p 2 ). We shall present this extension in Section [Tj 

The contents of this paper are organized as follows. For two-way contingency tables with overdis¬ 
persion, Brier (1980) analyzed the independence model but using a parametrization different from the 
log-linear modeling given in (11.611 . An advantage of using the log-linear model parametrization is that 
the estimation of the interaction parameter could provide some insight on the appropriate log-linear 
model before considering the goodness-of-fit test. In three-way contingency tables with overdispersion, 
the log-linear modeling makes clearly simpler the statistical inference needed for model fitting. Moti¬ 
vated by these facts, the second purpose of this paper is to present a new family of estimators useful 
for log-linear modeling with overdispersion, under the mild assumption that the distribution of the 
contingency tables is not specified but it is suppose to hold (11.811 . These new estimators are the quasi 
minimum divergence estimators. We shall refer them in Section [2j One member of these estimators 
is the so-called quasi maximum likelihood estimator. Their corresponding asymptotic properties are 
also shown. We shall propose in Section [3] new estimators for i? n and p 2 . The assumption of equal 
cluster sizes is generalized to unequal cluster sizes in Section [U In this setting on one hand, a new 
family of consistent estimators of the design effect or intracluster correlation coefficient is provided, 
and on the other hand a new estimator is proposed for the special case of large cluster sizes. Two 
numerical examples illustrate the practical application of the new proposed estimators in Section [5] 
and a simulation study is presented in Section [6] by using distributions for the contingency tables, 
related to (11.111) . (II . 12jl . (|1.13l) . Finally, in Section [7] some concluding remarks are provided. 


2 Quasi minimum ^-divergence estimator for log-linear models with 
complex sampling 


The nonparametric estimator of p ( 6 ) based on N clusters is 


N 




W 


e =i 


( 2 . 1 ) 
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N 


i.e. p = (pi, ...,pm) T , with p r = -4^ Yr^ ■ This global estimator can also be expressed through the 


1= 1 


average of the nonparametric estimators of p (6), based on the i -th cluster, p^ = -Y^\ i = 1, N, 




^=i 


i.e. p W = withpi £) = 4 P' r W , £ = 1,...,1V. 


TV 


In the particular case of with multinomial distribution {d n = !),£= 1,..., IV, Y = Y^ i 


is 


£=i 


also multinomial, Y = (Yj,..., Ym) 1 ~ JY[(nN,p (6)). Obtaining the MLE of 6 consists in maximizing 


Pr (Pi = y 1 ,..., Y M = Vm) = 


nN 
yi • • • vm 


Pi ( 0) V1 X ■ ■ ■ X p M (0) VM 


( 2 . 2 ) 


or equivalently 


log Pr (Pi = yi,..., Y m = Vm) = -nNd Ku llback(p,P (0)) + k, 


where k is a constant independent from the parameter 0, dKuiiback(PiP (9)) is the Kullback-Leibler 
divergence between the probability vectors p and p ( 6 ), i.e., 


M 


d 


Kullback 


(P,P(®)) = ^Pr i°g 


Pr 


r =1 


Pr (9)' 


Therefore the MLE of 9 for the multinomial model is given by the value 0 = 6 (Y) such that 

6 (Y) = argmin d Ku iiback{p,P (0)). (2.3) 

060 

Let (f> a convex function (f> (x), x > 0, such that at x = 1, <f> (1) = 0, <f>' (1) = 0, (1) > 0, at x = 0, 
0 (f> (0/0) = 0 and 0 4> (p/0) = lim p(f> (u) /u. It is well-known that the Kullback-Leibler divergence is a 

u—> oo 

particular case of the so-called phi-divergence measures between the probability vectors p and p(0 ), 
given by 

d < p(p,p(O)) = Y^p r ( 0 )^y^0 yj • ( 2 - 4 ) 

More thoroughly, taking 

4>(x) = x log x — x + 1, (2-5) 

(12.41) is the Kullback-Leibler divergence between p and p(9). For more details about phi-divergence 
measures see Pardo (2006). The phi-divergence based estimators for multinomial log-linear models is 
not new, see for instance Cressie and Pardo (2000, 2003), Cressie et al. (2003), Martin and Pardo 
(2008a, 2008b, 2010, 2011, 2012). 

When p 2 = 0, notice that a unique contingency table associated with one cluster (N = 1), 
Y = Y^ 1 ' J[4(n,p (6)), is enough for having a suitable sample for making asymptotic statisti¬ 

cal inference, since the Weak Law of Large Numbers and the Central Limit Theorem can be applied 
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directly inside the unique cluster, making the number independent observation inside, n, large enough. 
Nevertheless, when p 2 > 0, n is fixed and N must be large enough. The following definition shows 
that the phi-divergences, given in (|2.4lh are also useful for the general case of p 2 > 0. Unlike the 
multinomial sampling ( p 2 = 0), for clustered multinomial log-linear models (p 2 > 0) the knowledge of 
the shape of the moments, given in (11.81) . is only assumed. Since no underlying distribution is being 
assumed and only mild assumptions on the first two moments of a distribution are taken into account, 
the estimator of 9 is termed “quasi minimum ^-divergence estimator” of 9 (in the sequel, QM0E), 
defined for the first time for a more general setting in Vos (1992). 


Definition 2.1 We consider a statistical model verifying \1.8\) . The QM(f>E of 9, 9$ = 9^(Y), is 
defined as 

0<t>( Y ) = argmind0(p,p(0)). (2.6) 

ee© 

where d^ifp^p (0)), the phi-divergence measure between the probability vectors p and p (6), is given by 


From a practical point of view, in order to find the quasi minimum cj) divergence estimator of 9 for 
clustered multinomial log-linear models, we have to solve the following system of equations 


W T Y p(e) D p } ^(9) = 0 Mo , 


(2.7) 


where S p(0) D p(0) = I M -p (9) 1 J M , 


*+(9) = (*i(9),...,*U9)) rj 


*$(9)=p r (j>' 


Pr 


\Pr ( 9) 

This expression arises from considering 

d 


~ Pr {9) (f 


Pr 


Pr {9) 


r = 1,..., M. 


M 


—d^(p lP (9)) = -J2 


dp r {9) vt(9) 


—{ 96g p r (9) 


9 = 1, -,M 0 . 


These equations are nonlinear functions of the unknown parameter, 9. In order to solve these equations 
numerically the Newton-Raphson method is used, in such a way that the (t+l)th-step estimate, \ 
is opined fr0m 8<‘> as 

g(«+‘) = g(» _ GT 1 (fl‘‘V’X, 2 m D-‘ „ **(»“), 


pi 0 *) p(oy) 


where 


— {G r j >! g t h(9)) gh=1 — W Y p{e) D m D-, (e) D m Y p{e) 

= W T (J I M ~ P(0)1 T M ) D-, [e) (. T m ~ 1 mP T {9)) W , 


W 
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with 


g***w-s$*+&*W) 

_ y* ( Pr \ f Pr \ 2 1 dp r (6) dp r (6) ^ d 2 p r (0) 4^(0) 

\Pr{0)) \Pr{0)) Pr{0) 99 g dO h ^ 90 g d0 h p r {6) 

= 1 9p r (0)dpr{G) 

h A ’pm de 9 dQ h ’ 

#(0) = (ff(0),..,4(0)) T 1 

^(g) = (i/'f-B lS]-El - ^(0) r = l M 

A > 9 \p r (0)) Pr(0) A ) ’ 

It is worthwhile of mentioning that the quasi maximum likelihood estimators (QMLE), introduced by 
Wedderburn (1974), are very useful for clustered multinomial models, in particular for multinomial 
log-linear models. The QMLEs of 6 are obtained by solving the system of non-linear equations (12.71) 
with cj) given by (12.5ft . i.e., 

**(0)=p-p(0), 

and the expression of these estimators match the ones of the minimum Kullback-Leibler divergence 
estimators given in (|2.3I) . Hence, the QM</>Es are generalizations of the QMLEs, for clustered multi¬ 
nomial log-linear models. 

Under the assumption that the QM<j)E exists, Newton-Raphson method tends always to converge 
with any initialization or guess of parameters. However, for some samples, such as contingency tables 
with outlying cells or high-dimensional parameters, d^(p,p(6)), can be quite flat and can create 
troubles related to precision, even with no null frequencies. Such problems are related to Hessian 
matrices near to be singular. In addition, if the QMcpE fails to exist, the algorithm does not converge, 
since the Hessian matrix of d<p(p,p(0)) becomes iteratively to being singular. The Newton-Raphson 
method converges in general in few iterations, but is convenient to begin properly the iterative process 
with the weighted least squares method of fitting log-linear models (Grizzle, Strainer and Koch (1969)), 
i.e. taking 

(g) = {X T D ? X )~ 1 X T Dplogp 

with X = (1 Mi W) and removing from it u, the independent term. A count of p r = 0 is problematic, 
so one could set p r = \ • 

In order to ensure an algorithm with proper convergence properties in both cases at the same 
time, with zero frequencies or not, there are several possibilities. Quasi-Newton and conjugate gradi¬ 
ent methods require only evaluating gradients, being quasi-Newton methods faster but more storage 
demanding. For this reason, we have applied in the simulation study the Fortran NAG subroutine 
C05PBF, which is based on a modified version of the Powell’s hybrid algorithm, a combination of 
quasi-Newton and conjugate gradient methods. It is also worth of mentioning that the derivative free 
algorithms (Nelder-Mead, Hooke-Jeeves, Torczon) constitute a robust choice with respect to the initial 
point, and could be particularly useful for contingency tables with outlying cells. 
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Theorem 2.2 Let 0^ be the QMpE for the unknown parameter 6 of the clustered multinomial log- 
linear models, then it holds 

i) 

VN(e^ - 0 O ) AA(Om 0 , ^ {W T ^P{o 0 )Wy l ), (2.8) 

iV —/oo 

a) 

Vn(p( 9^) - p{e o)) AT(Om, ^S p(0o) w (w r s p(0o) w)' 1 w T v pm ), (2.9) 

with 6 o 6emg i/ie /rite and unknown value of 6. 

Proof. The proof is given in Section [A.31 of the Appendix. ■ 


3 Consistent estimator for t? n and p 2 


We consider p and p ( G ) defined in (12.11) and (11.11) respectively. By the Weak Law of Large Numbers, 
it holds 

p -A p(0 o ), 

iV—>■ oo 

and applying the Central Limit Theorem, it follows that 

>/N[p-p (0 O )) Af(0 M , ^fS p(0o) ), (3.1) 

where 5] p (a) was given in »• 

Remark 3.1 Notice that — = y + ^^zfp-p 2 an increasing function of the intracluster correlation, p 2 : 
with p 2 = we obtain ^ /or A: € {2,..., n} and with < p 2 < we obtain < -yf < 

/or k € (2, ...,n}. On t/ie ot/ier /land, i/ f/ie cluster size (n) were large, t(i — p 2 ) would be small, and 
^ = 2.(1 — p 2 ) + / o 2 would become similar to p 2 . 


Now, we shall consider the iV contingency tables expressed jointly in a unique AWL-dimensional 
vector, 

Y = (T (1)t , ..., T (Ar)T ) T , 

and we can define its corresponding vector of probabilities, p, as follows 

i ) — —y 

nN 

In addition, the inter-cluster-level homogeneous version of the probability vector is given by 


r = (jfP T ,..,±p T ) T - ( 3 . 2 ) 


Brier (1980) proposed a consistent estimator of d n based on comparing the discrepancy between p 
and p* in the following way 


N 


X\Y) = ^ (y» - npY ID? (y(0 - up) = n £ £ ^ ~ ^ 


1= 1 


£=1 r=l 


Pr 


(3.3) 
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with D a being the diagonal matrix of vector a. The shape of this estimator reminds the expression 
of the chi-square test-statistic for inter-cluster level homogeneity. 

The following theorem permit us to define estimators for d„ and p 2 through the same expres¬ 
sion proposed by Brier (1980). Nevertheless, these estimators are valid not only for the Dirichlet- 
multinomial distribution given in (11,1111 . as desired by Brier, but also for other distributions with 
over dispersion such as (11.1211 and (11.1311 . For this reason we refer them as the Brier’s estimators. 


Theorem 3.2 For A3.31) divided by (N — 1 )(M — 1), as N tends to infinity, it holds 


X 2 (Y) 


(N — 1)(M — 1) N —>oo 


d. 


X 2 {Y) 

(N—1)(M—1) 

n — 1 


- 1 


N —>-oo 


(3.4) 


Proof. The proof is given in Section IA.2I of the Appendix. ■ 

Since in this paper no specific distribution is assumed, the proof of Theorem 13.21 is completely new 
and more general than the one given in Brier (1980) and is the basis for considering the second of the 
following consistent estimators, for the design effect as well as the intracluster correlation coefficient. 


Definition 3.3 (Nonparametric estimators of d and p 2 ) The Brier’s consistent estimator of the 
design effect, d n , is 


K,n(Y) 


X 2 (Y) 

(N — 1)(M - 1)’ 


(3.5) 


where X 2 (Y) is defined in 113.31) . 
correlation coefficient, p 2 , is 


Similarly, the the Brier’s consistent estimator of the intracluster 



K,n(Y) - 1 
n — 1 


(3.6) 


The estimator for the design effect, d nj ]\r(Y), as well as for the intracluster correlation coefficient, 
p ^ n (Y) are fully non-parametric. Based on the proof of Theorem l3.2l it is possible to give the following 
definition based on the consistent estimator p{0p) of p(0) for a log-linear model with complex sampling, 
with dp being the QM^>E given in ()2.6I1 . This could be a semi-parametric version of the estimator, 
and is proposed for the first time in this paper. 


Definition 3.4 (Semiparametric estimators of d and p 2 ) The parametric extension of the Brier’s 
consistent estimator of d n is 

X 2 {Y,0p) 


d n , N (Y, Op) = 


(N — 1)(M — 1) ’ 


where 


N 


£=1 


* 2 TA) = E A (<) - "*) 


N N M n \ 2 

^ -np)=nY J Y. J 


1= 1 r =1 Pr(Ocf)) 


Similarly, the parametric extension of the Brier’s consistent estimator of p 2 is 

dn,N(Y, Op) - 1 


Pn, N (Y, Op) = 


n — 1 
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4 Generalization for unequal cluster sizes 


4.1 Notation and basic results 

Let us consider G groups of clusters in such a way that all the contingency tables, 


-( 9 /) — 


_ y 

— V 2 ! 5 1 M 


( 9A\T 


= 1 


of the same group of clusters have the same sample size n g , g = and N = J2 g =i^g- It is 

assumed having at least an index g such that n g > 1. If we replace the assumption N —>• 00 by 
N g —7 00 for each group of clusters, then all above stated results hold separately for each group of 
clusters. 

By following (12.11) . the nonparametric estimator of p (0), based on N g clusters, is now given by 


N a 


p(9) = 


n a N a — 
9 9 e=1 




N a 


i.e. p ^ = (pf , ...,pi^) T , with = ^~7r E Y r 9 ’ , r = 1, This global estimator can be also 

99 1= 1 

expressed through the average of the nonparametric estimators of p(6), based on the t-X\\ cluster, 
P M = ^ 


N g 


9 0 — 


— 1 


Ng/) _ J _y{9/) 


P 


n„ 


On the other hand, the nonparametric estimator of p(9), based on G groups of N\, ..., Nq clusters 
with sample size ni, ..., no respectively, is now given by 


G N g 


G 


N a 


G 


E E r (gJ) E Vv 9 J- E 

~ 9=1 £=1 9=1 1=1 Ng) 

P = — - = - 75 - = A W 9P > 


E n 9 N 9 
9=1 


E rigNg 

9=1 


9=1 


where 


G 


W n = 


n « N «—>0, g — 1, G, 


9 G 

E ra/i-Nfc 


(4.1) 


(4.2) 


h =1 


and E = 1- 

9=1 

Through the Central Limit Theorem, similarly to (13.11) . for the g-th group, it follows that 


VK(P {9) -P(Oo)) A AA(O m ,4 E p(®o))> 


(4.3) 


11 








and thus 


E n hN h 
h =1 _ 

/ ngNgtfv 
9=1 


= (P -P ( 0 O)) 


N lt ...,N G -^oo 


■A/’(°M, £p( 0 o ))- 


(4.4) 


See Section [A.41 in the Appendix for the details of the derivation of (14.41) . If in addition, if we assume 
that there exists a sequence {N^}^ =1 , such that 


Nh 

N 


N —>-oo 


N h e (0)1]> 


(B3D can be rewritten as 


where 


and 


such that 


Notice that 


Vn(p -p (0 o)) ^ N{ 0 m , %^S p(0o) ), 


G 

n = ^2 N *n g , 
9=1 
G 

tin* = 

9=1 


W 9 G 


N*n„ 

9 — >0, 5 = 1,-,G, 


E 

h =1 




9 jV->oo ^ 9 


* x~^G * i 

L,=l»o = 1- 


iV = 1 + P 2 (n* - 1) G (1, n*], 

i.e. dHTD represents the over dispersion parameter when the cluster size is 


Ep=wg- 


(4.5) 


(4.6) 

(4.7) 


(4.8) 


(4.9) 


In particular, i? n * = 1 (p 2 = 0 or n\ = ■ ■ ■ = no = 1) represents the case of multinomial sampling. 

It is interesting to be mentioned that Brier (1980, Section 3.4) proposed the unknown parameter 
$ n *, given in (14.71) . for the stronger assumption of Dirichlet-multinomial distribution for Y^ 9,e \ given in 
(11.111) . For this reason, in a future work, a new improved consistent estimator of i} n * could be a useful 
tool to propose appropriate test-statistics for the goodness-of-fit of log-linear models with clustered 
multinomial data under overdispersion. These test-statistics would require a weaker assumption in 
comparison with the Brier’s paper. 
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4.2 Brier’s modified estimators for $ n and p 2 

We shall define a consistent estimator of the design effect and the intracluster correlation coefficient, 
for unequal cluster sizes, as 


G 


$n*,N = ^2 w g$n g ,N g (Y g ), 
9= 1 


~2 

Pn*,N ~ 


where 


n* — 1 


is a consistent estimator of n* given in (14.71) or (14.81) and 

Y g = ((Y^) T ,-,(Y {9 ’ N! >)) T ) T , 

X 2 (Y g ) 


Vn Q ,N q (Y g ) = 


(Ng — 1)(M — 1 ) ’ 


N a M (d.^9) d.9)\2 


£=1 r=l TV 


(4.10) 

(4.11) 


g = 1, ...,G. Both estimators, (14.101) and (14.111) . are consistent estimators since 


$n*,N ~—> fin*, 
iV—>oo 


2 

Pn*,N ~ > P 
’ TV—>-oo 


In addition, focussed on a specific cluster size, notice that $ ng ,n*,N = l + p|* ^(n^ — 1) is an alternative 
consistent estimator of i? n , g = 1,..., G , but it requires from estimators of '0 n * and p 2 , i.e. (14.101) and 
(14.111) respectively. 

G N g _ _ ^ 

For Y = Y^ 9,i \ it is possible to follow Definition 12.II to obtain the QM^E of 6 , 6$ = 0^ (Y) 

g=ie =l 

and also Equation (12.71) replacing properly the expression of p, according to (14.11) . In a similar way 
done for Theorem 12.21 we have 

^( d <P~ d o) (W T S p(eo) W) _1 ) 

and 

VNfaidt) -P (do)) A A7(0m, ^p(B 0 )W {W T Y p(eo) Wy 1 W T Y p{6o) ). (4.12) 


4.3 New non-parametric and semi-parametric estimators for i9 n and p 2 
4.3.1 Case 1: N g > 1, g = 1,..., G 

Let Y = (Y 1 , ...,Y G ) T , be the whole sample with the dimension of Y g being the corresponding 
dimension, N g > 1. Based on the proof of Theorem 13.21 it is possible to propose a new non-parametric 
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consistent estimator of t?„ 9 with a faster convergence level by using 


K g ,Ng(Y g ,Y) = 


X 2 (Y g ,Y) 

{Ng — 1)(M — 1) ’ 


M , Ng 


XHY„,Y) = £ £ - ; Pr) = "»E(T E(^ 9 ’ - #’) 2 > 


£=1 r =1 


Pr 


Vr 

r =1 J r £=1 


rather than X 2 (Y g ) and t? ng ,A r g (Y g ) respectively, g = 1, Moreover, if the log-linear model were 

correctly validated, a new semi-parametric consistent estimator of even with a faster convergence 
degree is given by 


,» ,3 s, * 2 CU,A) 

Vn„N,yi „«*) - (JVj _l )(M-1)’ 

__ iV s M (d.^9) _Ag )\2 

X\Y a ,d t ) = n g J2f2 


1= 1 r=l 


Pr(G<j>) 


= Ur, 


±vi 1 M 

E——E 

r=l Pr(9(i>) £ = i 




g = 1,..., G. Plugging either & ngjNg (Y g ,Y) or $ ng ,N g {Y cpO ^) into Q4.10D in the place of G ng , Ng (Y g ), 
the new consistent estimators of the design effect is obtained, for unequal cluster sizes and based on 
phi-divergences (the intracluster correlation coefficient, (|4.11j) . is similarly computed). 


In the sequel we shall abbreviate by $n g ,N g , 




n g ,N g ,» i '&n g ,Ng,<t>i 


the three versions iL 


,(X 9 


^n g ,N g (Y g ,Y), $n g ,N g {Y g , 0$) respectively, and their corresponding expression for (14.101) . (14.111) . 


Vr. 


•—2 Q Z q Z 

,Ni Pn*,N > Pn*,N,»' ,N,<j>i Pn*,N,<f> 


4.3.2 Case 2: n g large enough and N g > 1, g = 1, ...,G 

When the values of the cluster sizes are large, without any loss of generality can be assumed that 
N g = 1 and G = N. By following Section I4~T1 and taking into account that lirrin^^ = p 2 , 

P {e) ^ M(p(G 0 ),p 2 Y p{eo) ), 

n-e^oo 

which means that p^\ ..., p ^ are (asymptotically, as rip —> oo) i.i.d. M-dimensional random vari¬ 
ables. Taking into account similar arguments as the ones given in Section fA.21 we obtain the following 
consistent estimators of p 2 as n g , N —» oo 


N 


? = 


(N — !) (M — 1) ■“ 


1 ZL 


5=1 




1 " 


s(s) 


N t- 


S =1 


1 


M 1 N / 1 AT 

ij E1E {N - 

’ r =1 ^ £=1 v 


(IV - 1) (M - 
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for the saturated model and 


N 


P 2 (0*) = 


M 


l N 




n,.i 

N 




1 A 


;W 


iV±r 


5=1 


(JV - 1) (M - 1) ^ *(«,) £ 
for the log-linear model. 


p1'> - 


5 Numerical examples 

The following two studies represent respectively the numerical examples for cases 1 and 2 in Section fOl 
Focussed on estimating the the intracluster correlation coefficient, p 2 , the semiparametric consistent 
estimators are considered for case 1, and the non-parametric ones for case 2. The second example 
illustrates that the estimation of the intracluster correlation coefficient corrects the variance we had 
without overdispersion, for using same estimators we had without overdispersion. The corresponding 
Fortran codes are available at http: //sites . google. com/site/nirianmartinswebsite/sof tware. 


5.1 Study on housing satisfaction (Brier, 1980) 

From all the households located in N = 20 neighborhoods around Montevideo (Minnesota, US), some 
households were randomly selected: from N\ = 18 neighborhoods n i = 5 houses were selected and 
from IV 2 = 2 neighborhoods n 2 = 3 houses. The neighborhoods are grouped into class g = 1 or 
g = 2 depending on the selected number of houses (neighborhood or cluster size), n 1 = 5 and 77-2 = 3 
respectively. For the l -th neighborhood (t = 1 ,...,N g ) of the g- th cluster size, in the s-th selected 
home (s = 1, ...,n g ), the family was questioned on two study interests: satisfaction with the housing 
in the neighborhood as a whole (.xj®’^), and satisfaction with their own home (x!f s ’^). For both 
questions the responses were classified as unsatisfied (US), satisfied (S) or very satisfied (US'). In the 
sequel, we shall identify the aforementioned categories of the ordinal variables, X^' (:> and X^’^, with 
numbers 1, 2, and 3: for example, ( US,S ) is associated with (X^\x^^) = (1,2). 

Under the assumption that a family’s classification according to level of personal satisfaction is 
independent of its classification by level of community satisfaction, the log-linear model given in (11.61) 
is considered for a I x J contingency table with I = J = 3. The corresponding data, given in Table 
EU are disaggregated based on the number of houses and neighborhood identifications (g,£) in 20 
rows, having each M = 9 cells in lexicographical order (number of columns). The design matrix and 
the unknown parameter vector are 


/I 1 1 0 0 0 -1 -1 

0 0 0 1 1 1 -1 -1 

10 - 110-11 0 

\o 1 -1 0 1 -1 0 1 


-1\ 

-1 

-1 

-V 


and 6 — (0^, 9 ^ 2 ), 0 2 ( 1 ) j ^ 2 ( 2 ) ) T ■ 
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(gJ) 

(US, US) 

y (p>^) 

2 11 

(US, S) 

y(p>^) 

2 12 

(US, VS) 

2 13 

(S, US) 

y(P»^) 

1 21 

(S,S) 

y(p>^) 

2 22 

(S,VS) 

y(P /) 

2 23 

(VS, US) 

y (p>^) 

2 31 

(VS,S) 

y (p>^) 

2 32 

(VS, US) 

y (p>^) 

2 33 

n g 

(i,i) 

i 

0 

0 

2 

2 

0 

0 

0 

0 

5 

(1,2) 

i 

0 

0 

2 

2 

0 

0 

0 

0 

5 

(1,3) 

0 

2 

0 

0 

2 

0 

0 

1 

0 

5 

(1,4) 

0 

1 

0 

2 

1 

0 

1 

0 

0 

5 

(1,5) 

0 

0 

0 

0 

4 

0 

0 

1 

0 

5 

(1,6) 

1 

0 

0 

3 

1 

0 

0 

0 

0 

5 

(1,7) 

3 

0 

0 

0 

1 

0 

0 

1 

0 

5 

(1,8) 

1 

0 

0 

1 

3 

0 

0 

0 

0 

5 

(1,9) 

3 

0 

0 

0 

0 

0 

1 

0 

1 

5 

(1,10) 

0 

1 

0 

0 

3 

1 

0 

0 

0 

5 

(1,11) 

1 

1 

0 

0 

2 

0 

1 

0 

0 

5 

(1,12) 

0 

1 

0 

4 

0 

0 

0 

0 

0 

5 

(1,13) 

0 

0 

0 

4 

1 

0 

0 

0 

0 

5 

(1,14) 

0 

0 

0 

1 

2 

0 

0 

0 

2 

5 

(1,15) 

2 

0 

0 

2 

1 

0 

0 

0 

0 

5 

(1,16) 

0 

0 

0 

1 

1 

1 

0 

2 

0 

5 

(1,17) 

2 

0 

0 

2 

1 

0 

0 

0 

0 

5 

(1,18) 

2 

0 

0 

2 

0 

0 

1 

0 

0 

5 

(2,1) 

1 

0 

0 

1 

1 

0 

0 

0 

0 

3 

(2,2) 

0 

0 

0 

1 

0 

1 

0 

0 

1 

3 

total 

18 

6 

0 

28 

28 

3 

4 

5 

4 

n = 96 


Table 5.1: Housing satisfaction in 20 neighbourhoods of Montevideo (Brier, 1980). 
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For estimation, the power divergence measures are considered, by restricting cj) from the family of 
convex functions to the subfamily 


4>\{x) = 


A(l+A) L~ ~ ”v~ i '• r- l -i ”J 

l im u—>A [x v+1 ~ X - v(x - 1)] , A G {-1, 0} 


where A G M is a tuning parameter. The expression of (12.41) becomes 


d<P\ (PiP(Q)) 


' M 

A(A+1) (^(0) ~~ Pr ( 0 )) ’ ^{-1,0} 

r =1 

d-KullbackiP (&) iP)i A 1 

_ dKullbacki.Pi P (^))j A 0 


in such a way that for each A G M a different divergence measure is obtained. By following Definition 
12.11 the quasi minimum power-divergence estimator (QMPE) of 9, is given by 9 ( f >x = arg mingg© d r p x {p, p ( 9 )). 
Notice that the case of A = 0 for the QMPE of 9, 9 ^ 0 , match the QMLE of 9, 9, or equivalently the 
QM</>E of 9 with (j) being equal to (12.51) . Under the independence model, the two parameters of interest, 

(3 = ( p2 


\p(0) 


are estimated through 


n *~J 

exp-fWtf^} ) 

J M ex Pl^jl / 

where 'dfi*.N is computed as (14.101) . and 9 r p x as follows. The expression of (0) = (\E fl f A (0), ..., [9)) T , 

for A G M — 1}, is given by 


Pr, 


,N,<p x 


I 


Pn * 


\P( e tx) I 




1 / Pr +1 

1 + A W(6) 



r = 1, ••., M, 


hence the QMPE of 9, 9 C / )X , is obtained by solving 

W T S p(e) D-g +1) (p A+1 - p x+1 (9)) = 0 Mo 


(5.1) 


where A £l-{-l) and 


—(A+l) 


MP) D P (e) 


(p 


A+l_ p A+l (0)) = (j M _ p(0) IT) D 


-A 

p(6>) 


(P A+ 1 -P A+1 W) 


1-P1(Q) 

P?(0) 

P2(0) 

pW) 


PM ( 6 ) 

Pl(0) 


Pl(fl) 

p¥w 

1— P2(fl) 

PP) 


PM iP) 

" p£(0) 


Pl(fl) 

Pm( 0 ) 

P2(0) 

Pm(®) 


1— Pm(0) 
Pm( 0 ) -I 


Pi - Pi (#) 

P2 - P2 (®) 
Pm-PaA0). 
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Since 


= TTa ( Im + a - d ? + 1 - d p(») +1) ) pW- 

the expression of G t j >x (6) is 

(■'„ <») = (r« - p(«)1m) (Jm + Aoi +1 D p( (A+1) ) d p(8) (7 m - 1mp t {0)) W 

= y^y (w T £ p(e) vr + aiv t s p( 8 ) oA + 1 d- ( ( a + 2 ) s p( 8 ) iv) , 

and the Newton-Raphson algorithm for the QMPE of 6 is 


jh+i) 

Ux 


-n {t) 

- 6 <t> x - 


-1 




(t) 


W + A W 1 S 


(t) 


d^ +1 d ( *+ 2) s 

) p _/sw 1 


(t) 


p(sy x ) p( % ) 


w 


x W T S 


y-j —(A+l)/y-A+l „A+1/Ah)\\ 


(5.2) 


Under no model assumption, the two parameters of interest, (3, are estimated through the saturated 
log-linear model 



or 


(3, 


n* ,N,» 



Under the independence model assumption as well as no model assumption, the estimates of (3 are 
shown for A € {—0.5, 0 , 2/3, 1 , 2 } in Table I5l2l The intracluster correlation coefficient exhibits the 
smallest value under no model assumption and under the independence model assumption a set of 
quite different values is obtained. In Section [ 6 j through a simulation study, some guidance is given 
for selecting the most appropriate estimate. The standard errors are also shown based on the square 
root of the diagonal elements of 


Var[p] 

Var[p] 


Vai :[p(00 A )] 




n*,N ' 


(Brier’s non-parametric), 




Nn ^p 

n*,N,<l> X yi ^ 

Nn P(00 A ) 


(improved Brier’s non-parametric), 

-l 


w w T s 


p( 0 <£ A ) 


w 


w t y: 


'p{ e <t> x Y 


Taking into account the asymptotic normality of p and p(d^ x ), their corresponding confidence inter¬ 
vals, with 1 —a level, could be calculated as PrT~a/ 2 Var[p r ] or p r .( 0 ^ A )=Fz a / 2 Var[p r ( 00 A )], r = 1 , ...,M. 


5.2 Study on FBI data (Weir and Hill, 2002) 

In an FBI Laboratory Division Publication, article by Budowle and Moretti (1999), genotype profile 
data were electronically published. Based on six US subpopulations, allele frequencies were reported 
for 13 commonly-used forensic loci in the Combined DNA Index System (CODIS): D3S1358, vWA, 
FGA, D8S1179, D21S11, D18S51, D5S818, D13S317, D7S820, CSF1PO, TPOX, THOl and D16S539. 
For the first four loci, allele frequencies are summarized in Tables 153115.61 based on six clusters, African 
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No model (Brier’s non-parametric) 


Pi 

P2 

P3 

Pa 

P5 

P6 

P7 

Ps 

P9 

Pn*,N 


0.1875 

0.0625 

0.0000 

0.2917 

0.2917 

0.0313 

0.0417 

0.0521 

0.0417 

0.0172 


(0.0411) 

(0.0255) 

(0.0000) 

(0.0479) (0.0479) (0.0183) (0.0210) (0.0234) 

(0.0210) 


No model (improved Brier’s non-parametric) 


Pi 

P2 

f>3 

Pa 

P5 

P6 

P7 

Ps 

P9 

—2 

Pn*,N-» 


0.1875 

0.0625 

0.0000 

0.2917 

0.2917 

0.0313 

0.0417 

0.0521 

0.0417 

0.0199 


(0.0413) 

(0.0256) 

(0.0000) 

(0.0481) (0.0481) (0.0184) (0.0212) (0.0235) 

(0.0212) 


Independence model 

A 


P2(0<j >x ) 

P3(0<fix) 

PA(0<t>\) 

P5(0<t>\ ) 

Pe( d 4>x) 

P7(0<t>x) 

Ps(0<t> x ) 

P9(0<t>x) 

—2 

Pft*,N,4> x 

-0.5 

0.1274 

0.1001 

0.0113 

0.3412 

0.2682 

0.0302 

0.0649 

0.0510 

0.0057 

0.3109 


(0.0387) 

(0.0323) 

(0.0082) 

(0.0617) 

(0.0564) 

(0.0207) 

(0.0278) 

(0.0226) 

(0.0045) 


0 

0.1302 

0.1016 

0.0182 

0.3201 

0.2497 

0.0448 

0.0705 

0.0550 

0.0099 

0.1545 


(0.0331) 

(0.0276) 

(0.0093) 

(0.0512) 

(0.0464) 

(0.0210) 

(0.0245) 

(0.0198) 

(0.0055) 


2/3 

0.1316 

0.1027 

0.0252 

0.3004 

0.2345 

0.0575 

0.0751 

0.0586 

0.0144 

0.0872 


(0.0303) 

(0.0253) 

(0.0103) 

(0.0456) 

(0.0411) 

(0.0214) 

(0.0229) 

(0.0186) 

(0.0066) 


1 

0.1319 

0.1033 

0.0280 

0.2931 

0.2296 

0.0622 

0.0761 

0.0596 

0.0162 

0.0712 


(0.0296) 

(0.0248) 

(0.0108) 

(0.0440) 

(0.0397) 

(0.0216) 

(0.0225) 

(0.0183) 

(0.0070) 


2 

0.1322 

0.1054 

0.0346 

0.2771 

0.2209 

0.0725 

0.0765 

0.0610 

0.0200 

0.0477 


(0.0283) 

(0.0241) 

(0.0118) 

(0.0414) 

(0.0374) 

(0.0222) 

(0.0215) 

(0.0178) 

(0.0078) 



Table 5.2: Estimates and standard errors (in brackets) of probabilities and intracluster correlation 
coefficient: non-paramatric and semiparametric versions for the independence model. 


e 

y W 
X 1 

yW 

vW 

yW 

yW 

r 5 

yW 

J 6 

yW 

yW 

1 8 

n e 

1 

i 

5 

37 

86 

99 

62 

19 

2 

311 

2 

0 

0 

53 

85 

85 

79 

63 

2 

367 

3 

0 

1 

28 

150 

100 

49 

33 

6 

367 

4 

0 

0 

21 

88 

96 

59 

19 

1 

284 

5 

2 

5 

19 

95 

81 

64 

15 

2 

283 

6 

0 

0 

8 

48 

42 

32 

17 

0 

147 


Table 5.3: Integer-valued alleles for D3S1358 loci desagregated by US subpopulations: 12, 13, 14, 15, 
16, 17, 18, 19 (M=8). 
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t 

yW 

y W 

vW 

yW 

M 

yW 

r 5 

yW 

1 6 

yW 

x 7 

yW 

*8 

y W 

yW 

MO 


1 

0 

2 

21 

76 

84 

60 

37 

22 

9 

0 

311 

2 

0 

1 

35 

41 

78 

97 

79 

32 

4 

0 

367 

3 

1 

19 

25 

127 

89 

73 

28 

5 

0 

0 

367 

4 

3 

8 

16 

43 

74 

59 

51 

23 

7 

0 

284 

5 

1 

1 

19 

62 

81 

53 

42 

15 

7 

2 

283 

6 

1 

1 

13 

18 

44 

39 

21 

7 

3 

0 

147 


Table 5.4: Integer-valued alleles for vWA loci desagregated by US subpopulations: 11, 13, 14, 15, 16, 
17, 18, 19, 20, 21 (M=10). 


£ 

yW 

M 

yW 

*2 

yW 

yW 

yW 

yW 

y W 
X 7 

yW 

x 8 

yW 

x 9 

yW 

M0 

y W 
Ml 

yW 
1 12 

yW 

M3 

n e 

1 

3 

16 

25 

38 

74 

36 

59 

33 

12 

7 

6 

1 

i 

311 

2 

12 

19 

54 

65 

68 

58 

54 

26 

7 

4 

0 

0 

0 

367 

3 

1 

30 

27 

45 

67 

52 

44 

55 

32 

13 

1 

0 

0 

367 

4 

0 

17 

22 

31 

42 

51 

60 

30 

10 

16 

2 

2 

0 

283 

5 

1 

19 

15 

18 

61 

61 

42 

32 

9 

16 

6 

3 

0 

283 

6 

2 

8 

14 

15 

25 

23 

30 

17 

6 

3 

2 

1 

1 

147 


Table 5.5: Integer-valued alleles for FGA loci desagregated by US subpopulations: 18, 19, 20, 21, 22, 
23, 24, 25, 26, 27, 28, 29, 30 (M=13). 


l 

yW 

M 

yW 

yW 

J 3 

yW 

x 4 

yW 

yW 

J 6 

yW 

M 

y W 
X 8 

yW 

J 9 

yW 

M0 

yW 

Ml 

n<> 

1 

i 

2 

6 

12 

32 

72 

104 

65 

14 

3 

0 

311 

2 

7 

4 

38 

19 

53 

127 

75 

40 

3 

1 

0 

367 

3 

1 

1 

34 

24 

41 

117 

90 

46 

10 

3 

0 

367 

4 

0 

1 

6 

16 

33 

54 

93 

55 

19 

7 

0 

284 

5 

0 

2 

3 

11 

32 

60 

89 

59 

25 

1 

1 

283 

6 

1 

0 

7 

11 

22 

35 

36 

26 

9 

0 

0 

147 


Table 5.6: Integer-valued alleles for D8S1179 loci desagregated by US subpopulations: 8, 9, 10, 11, 
12, 13, 14, 15, 16, 17, 18 (M=ll). 
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Americans (1), U.S. Caucasians (2), Hispanics (3), Bahamians (4), Jamaicans (5), and Trinidadians 

( 6 ). 

Weir and Hill (2002) proposed estimating p 2 from just the first and second moments of the allele 
frequency distribution, and this is the essence of their so-called method of moments 


M 


£ (MSP r 

r=1 


MSG r ) 


M ’ 

£ (MSPr + (fj- 1 )MSG r ) 

r =1 


(5.3) 


where 

i N 

MSPr = £3iE n ^ - ^) 2 ’ 

L i=i 

i N 

MSG r = —^ -- J>,#(1 -#)■ 

£<=i n t -N t=1 

In Table [ATI the estimates of p and p 2 are shown for loci D3S1358, vWA, FGA and D8S1179. The 
intracluster correlation coefficient exhibits a similar value for both methods, the new proposed esti¬ 
mation of Section [4.3.21 ( p 2 ) and Weir and Hill estimation (p 2 ). The standard errors are also shown 
based on the square root of the diagonal elements of 

Var (p) = (with overdispersion), 

Var(p) = —Sg (without overdispersion). 

2^=i n e 

In the case with overdispersion the standard errors have bigger values than in the case without overdis¬ 
persion (el>e2). The explanation to this difference is based on the correction that p 2 inherits for having 
large cluster sizes and in the case without overdispersion the formula does not inherits the assumption 
that the cluster sizes are large. 
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loci 


Pi 

P2 

P3 

P4 

P5 

P6 

P7 

P8 

P9 

Pio 

Pn 

Pl2 

Pl3 

D3S1358 est 

0.0017 

0.0063 

0.0944 

0.3138 

0.2860 

0.1961 

0.0944 

0.0074 

— 

— 

— 

— 

— 


el 

0.0010 

0.0019 

0.0070 

0.0111 

0.0108 

0.0095 

0.0070 

0.0020 

— 

— 

— 

— 

— 


e 2 

0.0007 

0.0014 

0.0051 

0.0081 

0.0079 

0.0069 

0.0051 

0.0015 

— 

— 

— 

— 

— 

vWA 

est 

0.0034 

0.0182 

0.0733 

0.2086 

0.2558 

0.2166 

0.1467 

0.0591 

0.0171 

0.0011 

— 

— 

— 


el 

0.0014 

0.0032 

0.0062 

0.0097 

0.0104 

0.0098 

0.0084 

0.0056 

0.0031 

0.0008 

— 

— 

— 


e 2 

0.0011 

0.0026 

0.0050 

0.0078 

0.0084 

0.0079 

0.0068 

0.0045 

0.0025 

0.0006 

— 

— 

— 

FGA 

est 

0.0108 

0.0620 

0.0893 

0.1206 

0.1917 

0.1598 

0.1644 

0.1098 

0.0432 

0.0336 

0.0097 

0.0040 0.0011 


el 

0.0025 

0.0058 

0.0068 

0.0078 

0.0094 

0.0087 

0.0088 

0.0075 

0.0049 

0.0043 

0.0023 

0.0015 0.0008 


e 2 

0.0016 

0.0038 

0.0045 

0.0052 

0.0062 

0.0058 

0.0059 

0.0050 

0.0032 

0.0029 

0.0016 

0.0010 0.0005 

D8S1179 est 

0.0057 

0.0057 

0.0534 

0.0529 

0.1211 

0.2644 

0.2769 

0.1654 

0.0455 

0.0085 

0.0006 

— 

— 


el 

0.0018 

0.0018 

0.0054 

0.0053 

0.0078 

0.0105 

0.0107 

0.0089 

0.0050 

0.0022 

0.0006 

— 

— 


e 2 

0.0014 

0.0014 

0.0040 

0.0040 

0.0059 

0.0079 

0.0080 

0.0067 

0.0037 

0.0017 

0.0004 

- 

- 


loci p 2 p 2 


D3S1358 est 0.0109 0.0109 
vWA est 0.0133 0.0156 
FGA est 0.0090 0.0065 
D8S1179 est 0.0116 0.0129 


Table 5.7: Estimates of p and p 2 for loci D3S1358, vWA, FGA and D8S1179 and standard errors of 
probabilities without (el) and with (e 2 ) overdispersion. 


6 Simulation study 

The major issue of interest of this section is to investigate, through Monte Carlo simulations, the 
improvement of the new estimators of the intracluster correlation coefficient, p 2 , based on X 2 (Y gi Y) 
or X 2 (Y g , # 0 A ), in comparison with either the Brier’s classical one, based on X 2 (Y g ) (see Section 
HD, or the Weir and Hill’s proposal (see Section 15.21) . Such an improvement is measured through 
R = 15, 000 replications, in terms of the root of the mean square error (RMSE) and bias. The 
estimates are truncated at 0 or 1 , to restrict the parameter space of p 2 to ( 0 , 1 ). As underlying unknown 
distributions, three scenarios are taken into account: the Dirichlet-multinomial (DM), the n-inflated 
multinomial (NI) and the random clumped (RC) distributions. In Appendix IA.5I the algorithms to 
generate observations from these distributions are provided. Initially, we tried to use the drnbet 
fortran IMSL subroutine to generate the DM distributions and we saw that it does not generate 
observations correctly from the beta distribution. Later, we discovered that Ahn and James (1995) 
had the same problem, and for this reason we have used the G05FEF fortran NAG subroutine. 
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6.1 Simulation: study on housing satisfaction 

Based on a mild modification of the study of housing satisfaction (Section l5.1j) . N\ = 18, IV 2 = 2, IV 3 = 
5 clusters are considered with G = 3 different cluster sizes, n\ = 5, 712 = 3, n$ = 7. In this way, the 
experiment can be evaluated for a value G not so close to G = 1 (equal cluster sizes). With theoretical 
values for the vector of unknown parameters 6 = (0 1(1 ), #i( 2 )> $ 2 ( 1 )) ® 2 ( 2 )) T = (0.1, 0.2,0.4, 0.3) T , the 
clustered multinomial distributions are simulated under the independence log-linear model of Section 

O 

In Figure [Q the plots on left hand side exhibit a greater value going up, for the three distribution, 
which means that RMSE(p|, jv 0 a ) < RMSE(p|» N m ) < RMSE(p|* N ) with A = |. A big part of the 
RMSE is due to bias, in fact bias(p|* i7V ^ ) < bias(p|, )iV -.) < bias(/o |*, N ) with A = | and for /o|» )A r, 0 A 
with A = | and /5|* N . the negative bias is becoming greater as p 2 increases. Identifying the proper 
log-linear model makes the bias of N ^ with A = | clearly smaller and stable as p 2 increases. The 
estimators were constructed under no distributional assumption but from the simulation study, but 
the behaviour of the estimators are appreciated to be quite different depending on the distributional 
assumption. It is also worth of being mentioned that the RMSE and the bias of the estimors of 
p 2 tends to be smaller with the DM and RC distributions in comparison with the NI distribution. 
The estimators with the DM distribution seem to be more precise and the estimators with the RC 
distribution less biased. In Figure EJ density plots based on the 15,000 replications are shown for 
p 2 = 0.5, and from them the same conclusions about the bias are obtained. By following the results 
of Figure [H where RMSE and the bias of N ^ is compared for A € {—0.5,0, |, 1,2}, the QMPE 
with A £ {|, 1} tends to be more precise than the QMLE (A = 0), however the QMLE (A = 0) seems 
to be more unbiased. The optimal choice of A for p|* N ^ seems to be very related with the optimal 
choice of of A for for the QMPE of 6. 
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RMSE RMSE RMSE 


Briers's estimator of p 2 Briers's estimator of p 2 




Briers's modified estimator of p 2 




DM Nl Rc| 


Briers's estimator of p 2 with A=2/3 Briers's estimator of p 2 with A=2/3 




DM Nl RC| | DM Nl - RC | 


Figure 1: RMSE and bias for different estimators of p 2 : p|* N (top), jo|» Na (middle), p|* N x with 
A = 2/3 (bottom). 
































Density 


0.0 0.2 0.4 0.6 0.8 1.0 



0.0 0.2 0.4 0.6 0.8 1.0 


0.0 0.2 0.4 0.6 0.8 1.0 


P 


2 


Figure 2: Density plots with estimates obtained from observations of three distributions, DM, NI, RC, 
when p 2 = 0.5: N (below, Br), /5|* N (middle, Br Mod), jo|. NX with A = 2/3 (top, lambda) 
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RMSE RMSE RMSE 


Briers's estimator of p 2 with DM 


0.08 


Briers's estimator of p 2 with DM 


0.16 



0 - 1 - 1 - 1 -■- 

0 0.2 0.4 0.6 0.8 


Briers's estimator of p 2 with RC 



Briers's estimator of p 2 with RC 



A=-0.5 

A=0 

A=2/3 

A=1 

A=2 


A=-0.5 

A=0 

A=2/3 

A=1 

A=2 


A=-0.5 

A=0 

A=2/3 

A=1 

A=2 



Figure 3: RMSE and bias of p|* N A with different values of A for DM (top), NI (middle), RC (bottom) 
distributions. 






































6.2 Study on FBI data (Weir and Hill, 2002) 

Based on the FBI data study (Section 15. 2|h with theoretical values obtained from the estimates of 
the probability vectors given in Table [5771 for loci D3S1358, vWA, FGA and D8S1179, the clustered 
multinomial distributions are studied under no underlying assumption (saturated log-linear model). 
Through Monte Carlo simulations, the RMSE and bias of the new estimator proposed in Section [4.3.21 
(p 2 ) and the Weir and Hill estimator ( J > 2 ) are compared in Figures HI O [H [71 focused respectively on 
the loci D3S1358, vWA, FGA and D8S1179. Since these kind of data have usually small values of the 
intracluster correlation coefficient, p 2 , the study is only focussed on p 2 E (0,0.1). Except for the RC 
distribution, the bias of p 2 tends to be greater than the bias of p 2 , however, the RMSE of p 2 tends to 
be smaller than the RMSE of ~p 2 . This weakness of the bias could be improved in case of being able 
to identify an apropriate log-linear model. 
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/> 2 's estimator: DM distribution p 2 's estimator: DM distribution 




p 2 's estimator: Nl distribution p 2 's estimator: Nl distribution 






Figure 4: RMSE and bias of the Brier adapted p 2 and Weir’s p 2 for small values of p 2 when DM, Nl 
and RC distributions are considered and the theoretical probabilities are equal to the estimates for 
locus D3S1358. 


























/> 2 's estimator: DM distribution p 2 's estimator: DM distribution 




p 2 's estimator: Nl distribution p 2 's estimator: Nl distribution 






Figure 5: RMSE and bias of the Brier adapted p 2 and Weir’s p 2 for small values of p 2 when DM, Nl 
and RC distributions are considered and the theoretical probabilities are equal to the estimates for 
locus vWA. 



























/> 2 's estimator: DM distribution p 2 's estimator: DM distribution 




p 2 's estimator: Nl distribution p 2 's estimator: Nl distribution 




P 2 ' s estimator: RC distribution p 2 's estimator: RC distribution 




Figure 6: RMSE and bias of the Brier adapted 'p 2 and Weir’s p 2 for small values of p 2 when DM, Nl 
and RC distributions are considered and the theoretical probabilities are equal to the estimates for 
locus FGA. 



























/> 2 's estimator: DM distribution p 2 's estimator: DM distribution 




p 2 's estimator: Nl distribution p 2 's estimator: Nl distribution 






Figure 7: RMSE and bias of the Brier adapted p 2 and Weir’s p 2 for small values of p 2 when DM, Nl 
and RC distributions are considered and the theoretical probabilities are equal to the estimates for 
locus D8S1179. 31 



























7 Concluding remarks 


This paper deals with log-linear models for studying the intracluster correlation coefficient in clustered 
multinomial data. As no distributional assumption is made, only the first two moment assumptions 
are considered, quasi-likelihood methods are followed. With the saturated log-linear model the non- 
parametric estimators of the intracluster correlation coefficient are considered, and the semi-parametric 
estimators arise for general log-linear models. New estimators are proposed for log-linear modeling in 
overdispersed clustered multinomial data with unequal cluster sizes, valid either in a non-paramateric 
and semi-parametric setting. Big differences are found in the Monte-Carlo simulation study, when 
comparing the root of the mean square error and the bias of the new estimators of the intracluster 
correlation coefficient with the clasical ones. In addition, quasi minimum (^-divergence estimators are 
proposed and from the Monte Carlo experiments we saw that it is possible to decrease the root of the 
mean square error in comparison with the quasi-maximum likelihood estimators. These results of this 
paper could be extended for any generalized linear model and the new estimators are promising to 
improve the quality of the goodness-of-fit test statistics for log-linear models in overdispersed clustered 
multinomial data. 

The referees suggested to us to consider the interesting problems related to mising values as well 
as to get the standard errors of pi N . We know that the problem of missing values has been very 
well solved in Chapter 2 of the PhD thesis of Raim (2014) for the random clumped distribution. We 
think that the problem associated to missing data using the modelization given in this paper, applying 
log-linear models, requires a separate paper. The standard errors of p^ N requires also a paper in the 
line of the paper of Weir and Hill (2002). 

Acknowledgement. We would like to thank the referees for their helpful comments and sugges¬ 
tions. This research is supported by the Spanish Grant MTM2012-33740 from Ministerio de Economia 
y Competitividad. 
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A Appendix 

A.l Zero-inflated binomial distribution 


The binomial distribution with zero inflation in the first cell, i.e. n-inflation in the second cell, is given 
by 

7i(0f 

V = v ) = 

[ ne 2 , if v = 0, with Pr(V = 0) = 1 — w 

Its first order moment vector is given by 


M ( n, ( ) ) , if v = 1, with Pr(V = 1) = w 


E 


Y*1 Y 

= E 

E 

YiA 

V 

V 2 ) 



v 2 ) 



= E 


= n 


fn) 

V = 1 

\*2 / 



Pr (V = 1) + E [ne 2 \ V = 0] Pr (V = 0) 


wpi(O) 


1 - wp i(0). 

The derivation for the the second order moment matrix calculation is given by 


E 


Var 

' t Yx \ 

V 

= Var 

' (v\ 

V = 1 


\y 2 ) 



[y 2 J 



Pr (V = 1) + Var [ne 2 \ V = 0] Pr (V = 0) 


= Var 


M n, 


(Pi(0)\ 
= mupi(G) (1 -pi(0)) 


w 


1 -1 

-1 1 
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-E 
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V = 1 


w + E 


E 1 
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V = 0 


(! ~w) 
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and hence 


Var 


= E 


Var 


V 


+ Var 


E 


= nwpi(6) [(1 -pi{9)) + n( 1 - w)p 1 {6)] 
= nwpi(0)(l — u?pi(0))(l + p 2 (n — 1)) 


V 


1 -1 

-1 1 

1 -1 

-1 1 


where 


2 (l-w)p 1 (G) wn-n 

P = —-77TT) i° r any w € (U, 1). 


1 — wpi(6) 

This result matches the one given in Morel and Neerchal (2012, page 83). Let 


(Y \V = v) = 


M ( n,p(0 )), if v = 1, with Pr(F = 1) = w 
neM, if v = 0, with Pr(y = 0) = 1 — w 


be the multinomial distribution with zero inflation in the first M — 1 cells, i.e. n-inflation in the M -th 
cell. 

For M > 3, a univariate homogeneous intraclass correlation coefficient, p 2 , seems not to be an 
appropriate measure to characterize the variability of this distribution, since the intraclass correlation 
along the cells seems to be heterogeous. The reason for this is that for M > 3 there is not an expression 
for the variance-covariance matrix of the multinomial distribution defined as a matrix not depending 
on parameters multiplied by a scalar with all the information about the parameters of the distribution. 

A.2 Proof of Theorem 13.21 

Let 


Sy = 


N — 


N T 

_ (VW _ (yW - np) , 


t= l 
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the matrix of quasi-variances and quasi-covariances of the simple random sample Y^ and 


(S\ 


Sy = diag(Sy) = 


Yi 


S Ym/ 


S 2 y r = 


1 


N — 1 


i=\ 


It is well-known that each diagonal element of Sy is a consistent estimator of each diagonal element 
of 79 n nS p ( 0 ), i.e. 


E [Sy] = diag{E [Sy]} = diag{Var[Y (f) ]} = diag{4nS p(0 )}, 


and 


Sy 'd n np r (d) (1 - Pr{0)) , r = 1, M, 

iV—>oo 

_ p 

or Sy —> diag(i9 n nS p ( 0 \). 

N^r-oo v ' 


It is not difficult to establish that 

M 


M N T 

trac e(S Y ) = Y S Y r = trace(SV) = ^ _ 1 Y ( rtf) “ n P ) “ 

r=l 1 = 1 


np 


(A.l) 


(A.2) 


which is consistent for trace(i? n n5]p(e)) = ^n^YlrLiPr^) (1 ~Pr(G))- We know that the chi-square 
test-statistic X 2 (Y ), given in (13.31) . has an asymptotic X(jv_i)(Af-i) distribution for fixed values of 
number of clusters N and an increasing cluster size, n, under the assumption of inter-cluster level 
homogeneity. However, this distribution is not a useful device for the proof. Based on the expression 
of the chi-square test-statistic, X 2 (Y), in terms of the variance-covariance matrix, as well as the same 
steps to obtain the expression and consistency of (1A.2I) . we are going to establish (13.4|) . We have 


trac e(SyiD p J ) ) 


N - 


N T 
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and 
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trac z{Sy\D p \ e) ) 


= traceE 




= trace ( E [Sy] ^D p( l $) 
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p(0) 


= d n [trace(Jjw) - trace(p(0)l^)] = i?„(M - 1). 


Hence, 
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and taking into account that p is a consistent estimator of p{9), as N —> oo, as well as (1A.1D 


M^ ,r “ (Sl 'i D p 1 ) = (JV - l)(Af - 1) 
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tends in probability to $ n , as IV —>• oo. In other words, 


X 2 (Y) 


1 




»t- p - 

Yr 


9 n n 


M 
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In addition, taking into account (11.91) . the right hand size of (|3.4I) follows. Finally, we like to mention 
that even though X 2 {Y) and $ n (IV — 1 )(M — 1) have the same expectation for a fixed value of N, 
this proof is not trivial since 9 n (N — 1 )(M — 1) as well as X 2 (Y) tend to infinite as IV —>• oo. 


A. 3 Proof of Theorem 12.21 

By applying the Central Limit Theorem it holds (13.11) . Hence, from Pardo (2006, formula (7.10)), for 
the minimum phi-divergence estimator of 9 of a log-linear model it holds 

VN(9^ - 9 0 ) = (W T S p(fl0) W) -1 W T Y, vm D p l eo) VN (p - p (9 0 )) + o p (l Mo ) , (A.3) 

and the variance-covariance matrix of \/N(9 c j > — 9 o) is 

£ ('r T s P <»„ ) w)- 1 w T Y.^D-^ piv D-^ rm w (w^^wy 1 
= lf {W T S rVa) wy 1 . (A.4) 

The last equality comes from 

^p(0o)-^p(0 o )^p( 0 o) = S P(0O ). 

From the Taylor expansion of p(9 ( / > ) around p(9 o) we obtain 

y/N(p(9 ( f ) ) -p(0 o )) = Y p ( 0o )W'/N(9 (/> - 9 0 ) + o p (1m), (A.5) 

and the variance-covariance matrix of yfN{p{9 ( j ) ) — p(9q)) is 

^Y p{eo) W (W T S p(flo) W)" 1 W T Y p[0o) . (A-6) 

Since (p — p (0q)) is normal and centred, from (|A.3l) and (|A.4[) . (12.81) is obtained. Similarly, since 
VNipt - 9 0 ) is normal and centred, from (IA.5I) and (IA.6I) . (12.91) is obtained. 
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A.4 Derivation of Formula (14. 4 j) 


G 


Multiplying (JOJ) by y/N~ g n g / Y n h^h 


h =1 




hence summing up from g = 1 to G and by the independence of clusters 
G 


Y u, s(p (9) - P ( 0 o)) = (p - P (Oo)) 


9=1 


N g —too, g =l,...,G 




(E Un h N h y 

Finally multiplying the previous expression by Yh=\ n h^h j \J Yl^= i n g^g^n g , the desired expression 


is obtained. 


A.5 Algorithms for Dirichlet-multinomial, n-inflated and random-clumped distri¬ 

butions 

The usual parameters of the M-dimensional random variable Y = with Dirichlet- 

multinomial distribution are a = (au,...,ajifi) r , where a r i = p r (0), r = 1 For con- 

/ p 2 \ P T 

venience it is considered with parameters (3 = . , p (6) = (pi ( 6 ), ...,pm (#)) , and is generated 

\P{0)J 

as follows: 

STEP 1. Generate B\ ~ Beta(au, au), with an = p\ ( 6 ), a\ 2 = ~r-(l — Pi (#))• 

STEP 2. Generate {Y\\B\ = &i) ~ Bin(n, b\). 

STEP 3. For r = 2,...,M-l do: 

1 2 2 ( r 

Generate B r ~ Beta(a rl , a r2 ) , with a r i = '-p r (0), a r2 = X ~ z ^~ ( 1 - Y Ph (6) 

p H \ h =l 

/ r —1 

Generate (Y r |Y = yi, ...,Y r _ 1 = y r _i,B r = b r ) ~ Bin In- Y Uh,b r 

\ h =l 

M—l 

STEP 4. Do (Y m \Yi = yi, ...,Y M -i = Vm-i) = n - Y Vh- 

h= l 

The random variable Y = (Yi, Ym) T of the n-inflated multinomial distribution with parameters 
(3, p(0), is generated as follows: 

STEP 1. Generate V ~ Ber(p 2 ). 

STEP 2. Generate 

M(n,p (6)), if v = 0 


(Y \V = v) = 


nM.{\,p{0)), if v = 1 


The random variable Y = (Yi, ...,Ym) t of the random clumped distribution with parameters (3, 
p(0)> is generated as follows: 
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STEP 1. Generate Yq = (Yoi, Yqm) T ~ M{l,p (6)). 

STEP 2. Generate K\^Bin(n,p). 

STEP 3. Generate (Y\\K\ = k\) = ((In, Y\m) T \Ki = ^l) ~ M(n — ki,p (0)). 

STEP 4. Do (Y\Ki = h) = Y 0 k x + {Xi\ K i = h). 

For the details about the equivalence of this algorithm and (11.121) . see Morel and Nagaraj (1993). 

It is interesting to note that there exists the package ’’Modeling overdispersion in i?” useful to 
generate the distributions considered in this Appendix. For more details see Raim et al (2015). 
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